Toward a Scoring Function for Quality-Driven Machine Translation

Authors

Douglas A. Jones

Gregory M. Rusk

Abstract

We describe how we constructed an automatic scoring function for machine translation quality; this function makes use of arbitrarily many pieces of natural language processing software that have been designed to process English-language text. By machine-learning values of functions available inside the software and by constructing functions that yield values based upon the software output, we are able to achieve preliminary, positive results in machine-learning the difference between human-produced English and machine-translation English. We suggest how the scoring function may be used for MT system development.

Introduction to the MT Plateau

We believe it is fair to say that the field of machine translation has been on a plateau for at least the past decade (see note 2). Traditional, hand-built MT systems held up very well in the ARPA MT evaluation (White and O'Connell 1994). These systems are relatively expensive to build and generally require a trained staff working for several years to produce a mature system. This is the current commercial state of the art: hand-building specialized lexicons and translation rules. A completely different type of system was competitive in this evaluation, namely, the purely statistical CANDIDE system built at IBM. It was generally felt that this system had also reached a plateau, in that more data and more training were not likely to improve the quality of the output.

Low Density Machine Translation

However, in the case of "Low Density Machine Translation" (see Nirenburg and Raskin 1998, Jones and Havrilla 1998), commercial market forces are not likely to provide significant incentives for machine translation systems for Low Density (Non-Major) languages any time soon. Two noteworthy efforts to break past the data and labor bottlenecks for high-quality machine translation development are the following. The NSF Summer Workshop on Statistical Machine Translation, held at Johns Hopkins University in summer 1999, developed a public-domain version intended as a platform for further development of a CANDIDE-style MT system. Part of the goal here is to improve the translation by adding levels of linguistic analysis beyond the word N-gram. An effort addressing the labor bottleneck is the Expedition Project at New Mexico State University, where a preliminary elicitation environment for a computational field linguistics system has been developed (the Boas interface; see Nirenburg and Raskin 1998).

Note 1: Douglas Jones is now at National Institute of Standards & Technology, Gaithersburg, MD 20899, Douglas.Jones@NIST.gov

Note 2: A sensible, plateau-friendly strategy may be to accumulate translation memory to improve both the long-term efficiency of human translators and the quality of machine translation systems. If we imagine that the plateau is really a kind of logarithmic function tending ever upwards, we need only be patient.

A Scoring Function for MT quality

Our contribution toward working beyond this plateau is to look for a way to define a scoring function for the quality of the English output such that we can use it to machine-learn a good translation grammar. The novelty of our idea for this function is that we do not have to define the internals of it ourselves per se. We are able to define a successful function for two reasons. First, there is a growing body of software worldwide that has been designed to consume English; all we need is for each piece of software to provide a metric as to how English-like its input is. Second, we can tell whether the software had trouble with the input, either by system-internal diagnosis or by diagnosing the software's output. A good illustration is the facility in current word-processing software to put red squiggly lines underneath text it thinks should be revised. We know from experience that this feature is often only annoying. Nevertheless, imagine that it is correct some percentage of the time, and that each piece of software we use for this purpose is correct some percentage of the time. Our strategy is to
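
As a rough illustration of the scoring idea described above (several imperfect English-processing tools each contribute a signal, and a learner combines them to separate human-produced English from machine-translation English), a minimal Python sketch might look like the following. The three "tools", the word list, the example sentences, and the choice of logistic regression are all assumptions made for illustration; the source does not say which software or learning method was used.

```python
import math

# Assumption: a tiny list of English function words, for illustration only.
FUNCTION_WORDS = {"the", "a", "an", "is", "was", "of", "to", "in", "on", "and", "it"}


def tokenize(sentence):
    """Lowercased whitespace tokens with edge punctuation stripped."""
    return [t.strip(".,!?").lower() for t in sentence.split()]


# Hypothetical stand-ins for "software that consumes English". Each returns a
# number reflecting how English-like the input looked to that tool.
def function_word_rate(sentence):
    toks = tokenize(sentence)
    return sum(t in FUNCTION_WORDS for t in toks) / max(len(toks), 1)


def adjacent_repeat_rate(sentence):
    toks = tokenize(sentence)
    return sum(a == b for a, b in zip(toks, toks[1:])) / max(len(toks), 1)


def missing_final_punctuation(sentence):
    return 0.0 if sentence.rstrip().endswith((".", "!", "?")) else 1.0


TOOLS = [function_word_rate, adjacent_repeat_rate, missing_final_punctuation]


def features(sentence):
    """One value per tool; the learner decides how much to trust each."""
    return [tool(sentence) for tool in TOOLS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(sentence, weights, bias):
    """Estimated probability that the sentence is human-produced English."""
    z = bias + sum(w * x for w, x in zip(weights, features(sentence)))
    return sigmoid(z)


def train(examples, epochs=200, lr=0.5):
    """Logistic regression fitted by plain stochastic gradient ascent."""
    weights = [0.0] * len(TOOLS)
    bias = 0.0
    for _ in range(epochs):
        for sentence, label in examples:
            x = features(sentence)
            error = label - score(sentence, weights, bias)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Toy training data (invented): label 1 = human English, 0 = MT-like output.
EXAMPLES = [
    ("The cat sat on the mat.", 1),
    ("It was a day of rain and wind.", 1),
    ("cat sat mat mat", 0),
    ("day rain rain wind strong", 0),
]

weights, bias = train(EXAMPLES)
human_like = score("The dog is in the house.", weights, bias)
mt_like = score("dog house house big", weights, bias)
print(f"human-like: {human_like:.3f}  MT-like: {mt_like:.3f}")
```

In this sketch, no single tool needs to be reliable on its own; the learned weights express how much each tool's signal helps distinguish the two kinds of English. Real tools (parsers, grammar checkers, and so on) would replace the toy functions.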


Similar resources

Accuracy-Based Scoring for Phrase-Based Statistical Machine Translation

Although the scoring features of state-of-the-art Phrase-Based Statistical Machine Translation (PB-SMT) models are weighted so as to optimise an objective function measuring translation quality, the estimation of the features themselves does not have any relation to such quality metrics. In this paper, we introduce a translation quality-based feature to PB-SMT in a bid to improve the translation ...

Introduction to the MT Plateau

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines, as these engines are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

Paraphrase-Supervised Models of Compositionality

Compositional vector space models of meaning promise new solutions to stubborn language understanding problems. This paper makes two contributions toward this end: (i) it uses automatically extracted paraphrase examples as a source of supervision for training compositional models, replacing previous work which relied on manual annotations used for the same purpose, and (ii) develops a contextawa...

Dynamic Models in Moses for Online Adaptation

A very hot issue for research and industry is how to effectively integrate machine translation (MT) within computer assisted translation (CAT) software. This paper focuses on this issue, and more generally how to dynamically adapt phrase-based statistical machine translation (SMT) by exploiting external knowledge, like the post-editions from professional translators. We present an enhancement of t...

Publication date: 2000
